Global Average Land Temperature Time Series forecasting with a simple Gated Recurrent Units (GRU) Neural Network architecture, using TensorFlow, Keras and Talos.


Time Series forecasting is one of Machine Learning challenges, especially when it comes to weather parameters, as these a subject to multiple processes influence that may hinder a better learning of these sequences of values. In this piece of work we are going to illustrate the power Deep Learning in tackling a typical problem, viz., forecasting the global average land temperature using a 2-layer GRU. The dataset contains monthly temperatures records ranging from 1750, for the longest time series, to 2015.
Just like performing a Grid-/Random-Search with Scikit-Learn, a set of hyperparameters will be optimized using Talos.
In [1]:
##################################################################################################################
##*********                    Moukouba Moutoumounkata, July 2020              ***************                  ##
##                Global Land Average Temperature Time Series Forecasting                                       ##
##                *******************************************************                                       ##
##################################################################################################################



#we will try to build a reproductible experience, although without 
#guarantee, due to the extremely random nature of Neural Networks
from numpy.random import seed
seed(1)
from tensorflow import set_random_seed
set_random_seed(2)

import numpy as np
import pandas as pd
from operator import itemgetter
import matplotlib.pyplot as plt
import chart_studio.plotly as py
import plotly.express as px
import plotly.graph_objs as go
import seaborn as sns

from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Importing the Keras libraries and packages from TensorFlow
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Dense, Dropout, LSTM, Flatten, 
                                     GRU, Bidirectional, TimeDistributed)
from tensorflow.keras.activations import relu, linear
from tensorflow.keras.optimizers import SGD, Adam, RMSprop
from tensorflow.keras.models import model_from_json
from tensorflow.keras import backend as K
import tensorflow as tf

import talos
from talos import scan, Evaluate, Reporting 
from talos.utils import early_stopper, lr_normalizer
from talos.utils.gpu_utils import parallel_gpu_jobs
from talos.utils.recover_best_model import recover_best_model

import warnings
warnings.filterwarnings("ignore")

%matplotlib inline

df = pd.read_csv("/home/moukouba/data_science/python/datasets/globaltemperatures.csv")
Using TensorFlow backend.
In [2]:
df.head()
Out[2]:
dt landaveragetemperature landaveragetemperatureuncertainty landmaxtemperature landmaxtemperatureuncertainty landmintemperature landmintemperatureuncertainty landandoceanaveragetemperature landandoceanaveragetemperatureuncertainty
0 1750-01-01 3.034 3.574 NaN NaN NaN NaN NaN NaN
1 1750-02-01 3.083 3.702 NaN NaN NaN NaN NaN NaN
2 1750-03-01 5.626 3.076 NaN NaN NaN NaN NaN NaN
3 1750-04-01 8.490 2.451 NaN NaN NaN NaN NaN NaN
4 1750-05-01 11.573 2.072 NaN NaN NaN NaN NaN NaN
In [3]:
df.shape
Out[3]:
(3192, 9)
In [4]:
df["dt"] = df.dt.apply(pd.to_datetime, errors='coerce')

df1 = df[["dt", "landaveragetemperature", "landaveragetemperatureuncertainty"]].set_index("dt")
df2 = df[["dt", "landmintemperature", "landmintemperatureuncertainty"]].set_index("dt")

print(df1.isnull().sum())
print(df2.isnull().sum())
df1.head(15)
landaveragetemperature               12
landaveragetemperatureuncertainty    12
dtype: int64
landmintemperature               1200
landmintemperatureuncertainty    1200
dtype: int64
Out[4]:
landaveragetemperature landaveragetemperatureuncertainty
dt
1750-01-01 3.034 3.574
1750-02-01 3.083 3.702
1750-03-01 5.626 3.076
1750-04-01 8.490 2.451
1750-05-01 11.573 2.072
1750-06-01 12.937 1.724
1750-07-01 15.868 1.911
1750-08-01 14.750 2.231
1750-09-01 11.413 2.637
1750-10-01 6.367 2.668
1750-11-01 NaN NaN
1750-12-01 2.772 2.970
1751-01-01 2.495 3.469
1751-02-01 0.963 3.827
1751-03-01 5.800 3.051
In [5]:
df1.shape
Out[5]:
(3192, 2)
In [6]:
df2 = df2.dropna()
df2.head()
Out[6]:
landmintemperature landmintemperatureuncertainty
dt
1850-01-01 -3.206 2.822
1850-02-01 -2.291 1.623
1850-03-01 -1.905 1.410
1850-04-01 1.018 1.329
1850-05-01 3.811 1.347
In [ ]:
 
As can be seen, the sequence has a few missing values (#12). These will be imputed using the Long Term Mean (LTM) of specific day, but not the LTM of all the series, so as not to affect the range of values that specific day may take.
In [7]:
#We define a function that replaces the missing entries with the long term mean of that specific day of the year
def imputer(df):
    #The actual dates with missing values within all the dataset
    dates_nan = pd.Series([index for index, row in df.iterrows() if row.isnull().any()])
    
    #The specific days of the year with some missing values (without duplicates)
    days_nan = dates_nan.dt.strftime('%m-%d').drop_duplicates()
    
    #The Sets of specific days with missing values
    set_with_missing = [[index for index in df.index if day in str(index)] for day in days_nan]
    
    #Now, we can replace the missings with, say, the mean/mode of each set
    for missing in dates_nan:
        for labels in set_with_missing:
            if missing in labels:
                df.loc[missing,] = df.loc[labels,].mean() 
    return df
In [ ]:
 
In [8]:
df1 = imputer(df1)

df1.head(15)
Out[8]:
landaveragetemperature landaveragetemperatureuncertainty
dt
1750-01-01 3.034000 3.574000
1750-02-01 3.083000 3.702000
1750-03-01 5.626000 3.076000
1750-04-01 8.490000 2.451000
1750-05-01 11.573000 2.072000
1750-06-01 12.937000 1.724000
1750-07-01 15.868000 1.911000
1750-08-01 14.750000 2.231000
1750-09-01 11.413000 2.637000
1750-10-01 6.367000 2.668000
1750-11-01 5.701539 0.961708
1750-12-01 2.772000 2.970000
1751-01-01 2.495000 3.469000
1751-02-01 0.963000 3.827000
1751-03-01 5.800000 3.051000
In [ ]:
 
Although the main focus of the work is to try and forecast the Land Average Temperature (LAT) only, it has been well worthwhile to pinpoint how global temperatures (average and minimum) have varied over the years, especially since 1900, in the context of Climate Change. That is, a succinct analysis of the Land Minimum Temperature (LMT) has been done accordingly. This has been done through graphical analysis of the series line plots.
In [9]:
#x1, x2, x3, x4, x5 = itemgetter(0, 1, 2, 3, 4)(np.array_split(X, 5))
x = np.array_split(df1, 3)

for data in x:
    fig = px.line(data["landaveragetemperature"])
    fig.show()
In [34]:
fig = px.line(df1["landaveragetemperatureuncertainty"])
fig.show()
In [10]:
#z1, z2, z3, z4, z5 = itemgetter(0, 1, 2, 3, 4)(np.array_split(Z, 5))

z = np.array_split(df2, 3)

for data in z:
    fig = px.line(data["landmintemperature"])
    fig.show()
Carefully looking at the above graphics, we can say that both the LAT and the LMT are indeed rising! It can be noticed that, up to 1930, the LAT was fluctuating about $14°C$; but from 1930 on, the LAT has gone well above this value, exceeding $15°C$ from 1995 onwards. Furthermore, the LMT is also on the rise, with values below $-3°C$ up to 1954; and from this year, the LMT has risen above $-3°C$ and has generally gone above $-2°C$.

Worthy of note, since the World did not have a dense network of observation stations until the last century, these values have been mostly estimated through Re-analysis, and the uncertainty curve shows that the farther we go back in time, the larger the uncertainty in estimation, with values approaching zero in recent decades. Therefore, values up to the end of 1800s should be considered with a grain of salt, as they tend to be inconsistent with the above observations.

Data Preparation

GRUs are Recurrent Neural Network models, and expect three-dimensional input with the shape [samples, timesteps, features]. The data will, thus, be reshaped accordingly. But first, we check for the consistency of the data; then through a certain number of transformations before feeding it to the model.
In [11]:
#Plotting boxplots to check for outliers

trace0 = go.Box(y=df2["landmintemperature"], name="Min_temp")
trace1 = go.Box(y=df1["landaveragetemperature"], name="Avg_temp")

sequences = [trace0, trace1]
fig = go.Figure(sequences)
fig.show()
The above boxplots may be missleading, as they show that the data is free from outliers. But a careful understanding of the variability of temperature with respect to months/seasons is enough to consider each month individually; for example, considering the month of February, we can see that it's not exempt from outliers, as shown in the graphic below. But, as stipulated above, we want to investigate the power of Deep Learing, so will we leave the algorithm to learn and make forecasts.
In [31]:
#Subsetting one month, say February, and plot the boxplots
df3 = df1[df1.index.month == 2]
df4 = df2[df2.index.month == 2]

trace0 = go.Box(y=df4["landmintemperature"], name="Min_temp")
trace1 = go.Box(y=df3["landaveragetemperature"], name="Avg_temp")

sequences = [trace0, trace1]
fig = go.Figure(sequences)
fig.show()
In [13]:
#We define a function that transforms the data to the required shape
def convert2matrix(data_arr, n_steps):
    X, y = [], []
    for i in range(len(data_arr) - n_steps):
        d = i + n_steps  
        X.append(data_arr[i:d, ])
        y.append(data_arr[d, ])
    return np.array(X), np.array(y)

#We define a function that splits the data into train, validation and test sets
def data_splitter(df, train_fraction):
    train_size = int(len(df)*train_fraction)
    train, test = df[0:train_size, ], df[train_size:len(df), ]
    return train, test
In [14]:
#Split the data into train, validation and test series
series = df1["landaveragetemperature"].values 

train, test = data_splitter(series, 0.85)
train, val = data_splitter(series, 0.75)
print(len(train), len(val), len(test))
2394 798 479
In [15]:
#Convert dataset into right shape to turn the problem into a supervised... 
#... learning one. We choose n_steps = 36, thus, we use we look back up to... 
#... 3 years (36) consecutive months to be able to forecast the next month

n_steps = 36

X_train, y_train = convert2matrix(train, n_steps)
X_val, y_val = convert2matrix(val, n_steps)
X_test, y_test = convert2matrix(test, n_steps)

#Scale the data
b_scaled = X_train.copy()
b_scaled_val = X_val.copy()
b_scaled_test = X_test.copy()

scaler = MinMaxScaler(feature_range=(0, 1))
x_train = scaler.fit_transform(b_scaled)
x_val = scaler.transform(b_scaled_val)
x_test = scaler.transform(b_scaled_test)

# reshape input to be [samples, time steps, features]
x_train = np.reshape(x_train, (x_train.shape[0], 1, x_train.shape[1]))
x_val = np.reshape(x_val, (x_val.shape[0], 1, x_val.shape[1]))
x_test = np.reshape(x_test, (x_test.shape[0], 1, x_test.shape[1]))
In [16]:
b_scaled_test.shape
Out[16]:
(443, 36)
In [17]:
for i in range(5):
    print(b_scaled_test[i], y_test[i])
[ 3.031  4.517  8.294 10.942 13.086 14.155 13.511 11.895  8.511  5.66
  3.681  2.492  3.471  5.702  8.85  11.78  13.876 14.631 14.09  11.862
  9.156  6.544  3.749  2.705  3.456  5.607  8.791 11.414 13.22  14.364
 13.297 12.03   9.339  6.35   3.74   2.679] 2.841
[ 4.517  8.294 10.942 13.086 14.155 13.511 11.895  8.511  5.66   3.681
  2.492  3.471  5.702  8.85  11.78  13.876 14.631 14.09  11.862  9.156
  6.544  3.749  2.705  3.456  5.607  8.791 11.414 13.22  14.364 13.297
 12.03   9.339  6.35   3.74   2.679  2.841] 5.474
[ 8.294 10.942 13.086 14.155 13.511 11.895  8.511  5.66   3.681  2.492
  3.471  5.702  8.85  11.78  13.876 14.631 14.09  11.862  9.156  6.544
  3.749  2.705  3.456  5.607  8.791 11.414 13.22  14.364 13.297 12.03
  9.339  6.35   3.74   2.679  2.841  5.474] 8.455
[10.942 13.086 14.155 13.511 11.895  8.511  5.66   3.681  2.492  3.471
  5.702  8.85  11.78  13.876 14.631 14.09  11.862  9.156  6.544  3.749
  2.705  3.456  5.607  8.791 11.414 13.22  14.364 13.297 12.03   9.339
  6.35   3.74   2.679  2.841  5.474  8.455] 11.199000000000002
[13.086 14.155 13.511 11.895  8.511  5.66   3.681  2.492  3.471  5.702
  8.85  11.78  13.876 14.631 14.09  11.862  9.156  6.544  3.749  2.705
  3.456  5.607  8.791 11.414 13.22  14.364 13.297 12.03   9.339  6.35
  3.74   2.679  2.841  5.474  8.455 11.199] 13.487
Now, we build a basic GRU neural network, with two hidden layers and one dropout-regularization layer. Two metrics (the Mean Absolute Error and the Coefficient of Determination (r_squared)), will be used.
In [18]:
# Setting the dictionary of the hyperparameter to be included in the optimisation process 
p = {'epochs': (20, 150, 5),
     'neurons1': [ 32, 64, 128, 256],
     'neurons2': [32, 64, 128, 256],
     'dropout': (0.1, 0.6, 5),
     'loss': ['mse', 'mae'],
     'activation1':[relu, None,],
     'activation2':[linear, None,],
     'batch_size': [8, 16, 32, 64, 128],
     'optimizer': ['Adam', 'SGD', 'RMSprop']
     }
In [19]:
#Define custom coefficient of determination metric
def r_squared(y_true, y_pred):
    SS_res =  K.sum(K.square(y_true-y_pred)) 
    SS_tot = K.sum(K.square(y_true - K.mean(y_true))) 
    return (1 - SS_res/(SS_tot + K.epsilon()))


#We define a model builder function. We wrap the first hidden layer into a Bidirectional layer 
def model_builder(x_train, y_train, x_val, y_val, params, n_steps = 36, n_features = 1):
    
    tf.keras.backend.clear_session()
    
    model = Sequential([
        Bidirectional(GRU(params['neurons1'], return_sequences=True, 
                          activation=params['activation1'], input_shape=(n_features, n_steps))),
        GRU(params['neurons2'], activation=params['activation1']),
        Dropout(params['dropout']),
        
        Dense(1, activation=params['activation2'])        
    ])
    
    model.compile(optimizer=params['optimizer'], loss=params['loss'], metrics=['mae', r_squared])
    
    history = model.fit(x_train, y_train, epochs=params['epochs'], batch_size=params['batch_size'],  verbose=0, 
                        validation_data=[x_val, y_val], callbacks=[early_stopper(epochs=params['epochs'], 
                                                                                 mode='moderate', monitor='val_loss')])
    
    return history, model
Next, we run the experiments creating a Scan object (scrutinizer) and split the GPU memory in two for two parallel jobs; and for the sake of computational burden, only 0.1% of the hyperparameter space, randomly downsampled, will be used - that is, 48 rounds - and the process took 10 minutes and 34 seconds to execute, on a modest GTX-1050 GPU.
In [21]:
parallel_gpu_jobs(0.5)

scrutinizer = talos.Scan(x=x_train, y=y_train, x_val=x_test, y_val=y_test, seed=42, 
                         model=model_builder, experiment_name='time_series__gru_hpo_01', 
                         params=p,fraction_limit=0.001, reduction_metric='val_loss')
  0%|          | 0/48 [00:00<?, ?it/s]
  2%|▏         | 1/48 [00:05<04:29,  5.73s/it]
  4%|▍         | 2/48 [00:12<04:35,  5.99s/it]
  6%|▋         | 3/48 [00:18<04:31,  6.03s/it]
  8%|▊         | 4/48 [00:26<04:45,  6.49s/it]
 10%|█         | 5/48 [00:36<05:24,  7.55s/it]
 12%|█▎        | 6/48 [00:45<05:42,  8.15s/it]
 15%|█▍        | 7/48 [01:10<08:57, 13.12s/it]
 17%|█▋        | 8/48 [01:20<08:13, 12.33s/it]
 19%|█▉        | 9/48 [01:28<07:11, 11.06s/it]
 21%|██        | 10/48 [01:40<07:12, 11.37s/it]
 23%|██▎       | 11/48 [01:46<05:52,  9.52s/it]
 25%|██▌       | 12/48 [01:52<05:03,  8.44s/it]
 27%|██▋       | 13/48 [02:02<05:15,  9.01s/it]
 29%|██▉       | 14/48 [02:06<04:20,  7.65s/it]
 31%|███▏      | 15/48 [02:12<03:54,  7.12s/it]
 33%|███▎      | 16/48 [02:27<04:58,  9.34s/it]
 35%|███▌      | 17/48 [03:48<15:55, 30.82s/it]
 38%|███▊      | 18/48 [04:11<14:16, 28.56s/it]
 40%|███▉      | 19/48 [04:27<12:01, 24.88s/it]
 42%|████▏     | 20/48 [04:33<08:51, 18.97s/it]
 44%|████▍     | 21/48 [04:57<09:18, 20.68s/it]
 46%|████▌     | 22/48 [05:04<07:10, 16.55s/it]
 48%|████▊     | 23/48 [05:17<06:22, 15.31s/it]
 50%|█████     | 24/48 [05:22<04:59, 12.46s/it]
 52%|█████▏    | 25/48 [05:28<04:01, 10.49s/it]
 54%|█████▍    | 26/48 [06:03<06:30, 17.76s/it]
 56%|█████▋    | 27/48 [06:49<09:10, 26.22s/it]
 58%|█████▊    | 28/48 [07:00<07:11, 21.58s/it]
 60%|██████    | 29/48 [07:06<05:20, 16.87s/it]
 62%|██████▎   | 30/48 [07:14<04:17, 14.32s/it]
 65%|██████▍   | 31/48 [07:18<03:11, 11.28s/it]
 67%|██████▋   | 32/48 [07:27<02:50, 10.64s/it]
 69%|██████▉   | 33/48 [07:33<02:15,  9.05s/it]
 71%|███████   | 34/48 [07:39<01:54,  8.16s/it]
 73%|███████▎  | 35/48 [07:54<02:14, 10.32s/it]
 75%|███████▌  | 36/48 [08:12<02:31, 12.65s/it]
 77%|███████▋  | 37/48 [08:27<02:25, 13.21s/it]
 79%|███████▉  | 38/48 [08:58<03:07, 18.77s/it]
 81%|████████▏ | 39/48 [09:08<02:24, 16.03s/it]
 83%|████████▎ | 40/48 [09:14<01:43, 12.92s/it]
 85%|████████▌ | 41/48 [09:21<01:18, 11.27s/it]
 88%|████████▊ | 42/48 [09:26<00:56,  9.47s/it]
 90%|████████▉ | 43/48 [09:34<00:44,  8.97s/it]
 92%|█████████▏| 44/48 [09:52<00:46, 11.65s/it]
 94%|█████████▍| 45/48 [10:12<00:42, 14.16s/it]
 96%|█████████▌| 46/48 [10:16<00:22, 11.19s/it]
 98%|█████████▊| 47/48 [10:27<00:10, 10.98s/it]
100%|██████████| 48/48 [10:34<00:00, 13.22s/it]
In [ ]:
 
In [24]:
scrutinizer.details
Out[24]:
experiment_name        time_series__gru_hpo_01
random_method                 uniform_mersenne
reduction_method                          None
reduction_interval                          50
reduction_window                            20
reduction_threshold                        0.2
reduction_metric                      val_loss
complete_time                   07/17/20/05:43
x_shape                          (2358, 1, 36)
y_shape                                (2358,)
dtype: object
In [25]:
analyze_object = talos.Analyze(scrutinizer)
analyze_object.data
Out[25]:
start end duration round_epochs loss mean_absolute_error r_squared val_loss val_mean_absolute_error val_r_squared activation1 activation2 batch_size dropout epochs loss neurons1 neurons2 optimizer
0 07/17/20-053235 07/17/20-053241 5.444569 5 1.823407 1.021466 0.897185 1.375936 1.108128 0.910941 <function relu at 0x7f85eb4d27a0> <function linear at 0x7f85eb4d2a70> 16 0.2 20 mse 64 32 SGD
1 07/17/20-053241 07/17/20-053247 6.349154 4 1.102617 1.102618 0.858764 0.309115 0.309115 0.989909 <function relu at 0x7f85eb4d27a0> <function linear at 0x7f85eb4d2a70> 8 0.4 20 mae 64 128 Adam
2 07/17/20-053247 07/17/20-053253 5.943774 4 0.833030 0.833030 0.916693 1.179583 1.179583 0.869796 None None 8 0.2 20 mae 32 128 RMSprop
3 07/17/20-053254 07/17/20-053301 7.381340 13 0.892367 0.892367 0.928306 0.850955 0.850955 0.941563 <function relu at 0x7f85eb4d27a0> <function linear at 0x7f85eb4d2a70> 64 0.2 46 mae 128 256 RMSprop
4 07/17/20-053301 07/17/20-053311 9.839658 3 1.426376 0.906546 0.904746 0.779794 0.806560 0.939550 None None 8 0.1 20 mse 256 64 RMSprop
5 07/17/20-053311 07/17/20-053321 9.349173 16 1.023414 0.733189 0.944160 0.313052 0.446797 0.979995 <function relu at 0x7f85eb4d27a0> <function linear at 0x7f85eb4d2a70> 32 0.2 98 mse 64 256 Adam
6 07/17/20-053321 07/17/20-053345 24.510737 14 1.117018 0.787002 0.928426 0.227748 0.372953 0.982543 <function relu at 0x7f85eb4d27a0> None 8 0.1 124 mse 128 128 RMSprop
7 07/17/20-053345 07/17/20-053356 10.299137 8 1.557129 0.946037 0.905404 1.843515 1.289751 0.876865 None <function linear at 0x7f85eb4d2a70> 16 0.4 46 mse 128 128 RMSprop
8 07/17/20-053356 07/17/20-053404 7.886449 18 1.130068 0.783006 0.940711 0.224861 0.384933 0.985562 None <function linear at 0x7f85eb4d2a70> 32 0.3 46 mse 64 128 Adam
9 07/17/20-053404 07/17/20-053416 11.889949 17 1.012023 1.012023 0.897245 0.506639 0.506639 0.976009 <function relu at 0x7f85eb4d27a0> None 16 0.3 72 mae 128 64 SGD
10 07/17/20-053416 07/17/20-053421 5.014678 8 1.096801 0.759404 0.942120 0.431419 0.567642 0.974346 None None 64 0.3 46 mse 128 256 SGD
11 07/17/20-053421 07/17/20-053427 5.745974 28 1.345578 1.345578 0.830237 0.335631 0.335631 0.989812 <function relu at 0x7f85eb4d27a0> None 64 0.3 98 mae 64 32 RMSprop
12 07/17/20-053427 07/17/20-053437 10.147461 24 0.893910 0.667665 0.953362 0.148321 0.303346 0.991034 None None 64 0.2 124 mse 128 256 Adam
13 07/17/20-053438 07/17/20-053442 4.282582 8 3.592751 1.405758 0.804835 1.504428 1.161462 0.908490 None None 32 0.5 20 mse 32 32 RMSprop
14 07/17/20-053442 07/17/20-053448 5.692707 43 0.980577 0.713081 0.949255 0.192618 0.354703 0.988320 None None 128 0.1 72 mse 64 64 Adam
15 07/17/20-053448 07/17/20-053502 14.322088 17 2.010268 1.082673 0.896172 0.298981 0.457450 0.982221 <function relu at 0x7f85eb4d27a0> <function linear at 0x7f85eb4d2a70> 64 0.5 98 mse 256 256 RMSprop
16 07/17/20-053502 07/17/20-053623 80.734927 34 1.007710 0.733561 0.933487 0.341570 0.488623 0.973860 <function relu at 0x7f85eb4d27a0> None 8 0.2 124 mse 128 256 RMSprop
17 07/17/20-053623 07/17/20-053647 23.106701 22 1.246300 1.246300 0.820996 0.459337 0.459337 0.975420 <function relu at 0x7f85eb4d27a0> <function linear at 0x7f85eb4d2a70> 8 0.5 124 mae 128 32 SGD
18 07/17/20-053647 07/17/20-053703 16.100532 18 1.654664 0.980509 0.897872 0.347724 0.492978 0.977857 <function relu at 0x7f85eb4d27a0> None 16 0.5 98 mse 64 256 Adam
19 07/17/20-053703 07/17/20-053708 4.979184 19 1.267682 1.267682 0.851950 0.584440 0.584440 0.970892 None None 64 0.4 72 mae 128 32 SGD
20 07/17/20-053708 07/17/20-053733 24.483303 29 2.312648 1.134555 0.850173 1.012077 0.950686 0.923952 <function relu at 0x7f85eb4d27a0> <function linear at 0x7f85eb4d2a70> 8 0.2 124 mse 64 32 RMSprop
21 07/17/20-053733 07/17/20-053740 6.728299 39 0.968470 0.708555 0.949128 0.151754 0.314404 0.991006 None <function linear at 0x7f85eb4d2a70> 128 0.1 98 mse 128 64 Adam
22 07/17/20-053740 07/17/20-053752 12.214292 13 0.825033 0.825033 0.918555 0.301964 0.301964 0.959571 None None 8 0.2 46 mae 32 64 Adam
23 07/17/20-053752 07/17/20-053758 5.623375 15 0.888510 0.888510 0.925293 0.535114 0.535114 0.975411 None <function linear at 0x7f85eb4d2a70> 32 0.3 46 mae 32 64 Adam
24 07/17/20-053758 07/17/20-053804 5.706825 7 0.850787 0.850787 0.906506 0.987030 0.987030 0.930383 None None 16 0.3 46 mae 32 128 RMSprop
25 07/17/20-053804 07/17/20-053838 34.530986 40 0.761421 0.761421 0.924081 0.706056 0.706056 0.951396 None <function linear at 0x7f85eb4d2a70> 8 0.2 98 mae 64 64 Adam
26 07/17/20-053839 07/17/20-053924 45.772839 21 1.363951 0.863160 0.910766 0.459370 0.587841 0.962096 <function relu at 0x7f85eb4d27a0> None 8 0.5 72 mse 256 128 SGD
27 07/17/20-053925 07/17/20-053935 10.541960 50 1.202067 0.817629 0.938789 0.156858 0.320376 0.990209 None None 128 0.3 124 mse 64 256 RMSprop
28 07/17/20-053935 07/17/20-053941 5.680723 14 1.219909 1.219909 0.866566 0.420560 0.420560 0.984135 <function relu at 0x7f85eb4d27a0> <function linear at 0x7f85eb4d2a70> 64 0.5 72 mae 128 128 Adam
29 07/17/20-053941 07/17/20-053949 8.187706 17 1.428998 1.428998 0.817462 0.924580 0.924580 0.940744 None <function linear at 0x7f85eb4d2a70> 64 0.5 98 mae 256 32 Adam
30 07/17/20-053950 07/17/20-053954 3.961443 4 0.917160 0.917160 0.920973 0.388453 0.388453 0.985928 <function relu at 0x7f85eb4d27a0> <function linear at 0x7f85eb4d2a70> 32 0.2 20 mae 64 128 RMSprop
31 07/17/20-053954 07/17/20-054003 8.954482 50 0.953082 0.953082 0.917153 0.433025 0.433025 0.982998 <function relu at 0x7f85eb4d27a0> None 64 0.3 124 mae 32 128 Adam
32 07/17/20-054003 07/17/20-054008 5.155778 19 1.146195 1.146195 0.881746 0.683873 0.683873 0.963199 None None 64 0.5 72 mae 128 64 SGD
33 07/17/20-054008 07/17/20-054014 5.882210 8 2.657938 1.232522 0.851262 0.303648 0.478423 0.979776 <function relu at 0x7f85eb4d27a0> <function linear at 0x7f85eb4d2a70> 16 0.2 46 mse 64 32 Adam
34 07/17/20-054014 07/17/20-054029 15.161357 23 0.841416 0.841416 0.926709 0.370103 0.370103 0.986072 <function relu at 0x7f85eb4d27a0> None 16 0.3 124 mae 128 128 SGD
35 07/17/20-054030 07/17/20-054048 17.896612 22 1.251668 0.834618 0.930665 0.763941 0.800764 0.948791 None <function linear at 0x7f85eb4d2a70> 16 0.3 124 mse 128 128 RMSprop
36 07/17/20-054048 07/17/20-054102 14.330067 5 2.390909 1.184087 0.843497 0.229007 0.399866 0.982998 None None 8 0.3 20 mse 256 32 RMSprop
37 07/17/20-054102 07/17/20-054134 31.527902 15 0.821433 0.821433 0.919384 1.153121 1.153121 0.878962 None None 8 0.1 124 mae 256 32 Adam
38 07/17/20-054134 07/17/20-054143 9.420665 9 2.073977 1.100979 0.889724 1.466246 1.161794 0.909357 <function relu at 0x7f85eb4d27a0> <function linear at 0x7f85eb4d2a70> 32 0.2 46 mse 256 64 RMSprop
39 07/17/20-054144 07/17/20-054149 5.487354 25 2.687771 1.235235 0.860198 0.435329 0.585492 0.973614 <function relu at 0x7f85eb4d27a0> <function linear at 0x7f85eb4d2a70> 64 0.4 124 mse 32 64 RMSprop
40 07/17/20-054149 07/17/20-054157 7.203435 12 1.644385 0.970049 0.913994 0.625195 0.704134 0.962685 None None 64 0.4 72 mse 256 64 SGD
41 07/17/20-054157 07/17/20-054202 5.061798 6 0.871386 0.871386 0.923553 0.495110 0.495110 0.976275 None <function linear at 0x7f85eb4d2a70> 16 0.2 20 mae 64 64 SGD
42 07/17/20-054202 07/17/20-054210 7.633782 34 2.672152 1.226103 0.859911 0.128604 0.283822 0.992150 None None 64 0.4 98 mse 128 32 Adam
43 07/17/20-054210 07/17/20-054228 17.695319 24 0.789364 0.789364 0.938743 1.031510 1.031510 0.925336 None <function linear at 0x7f85eb4d2a70> 32 0.3 124 mae 128 256 RMSprop
44 07/17/20-054228 07/17/20-054248 19.825017 25 0.800355 0.800355 0.934585 0.561869 0.561869 0.969780 <function relu at 0x7f85eb4d27a0> <function linear at 0x7f85eb4d2a70> 16 0.3 72 mae 32 256 Adam
45 07/17/20-054248 07/17/20-054252 4.062494 17 0.825227 0.825227 0.936489 0.443254 0.443254 0.982269 None None 128 0.2 20 mae 64 128 SGD
46 07/17/20-054252 07/17/20-054302 10.313258 13 1.343467 0.855257 0.928763 0.460792 0.596321 0.971121 None <function linear at 0x7f85eb4d2a70> 32 0.5 72 mse 256 128 SGD
47 07/17/20-054303 07/17/20-054309 6.802013 4 2.000393 1.067152 0.870234 1.746786 1.212220 0.877057 <function relu at 0x7f85eb4d27a0> <function linear at 0x7f85eb4d2a70> 8 0.5 20 mse 128 128 SGD
In [ ]:
 
In [26]:
# The lowest root mean squared error achieved on validation set 
best_r_squared = analyze_object.high('val_r_squared')
# The lowest mean absolute error achieved on validation set 
best_mae = analyze_object.low('val_mean_absolute_error')

print("The scores for rmse and mae, on validation set are %.5f and %.5f, respectively."%(best_r_squared, best_mae))
The scores for rmse and mae, on validation set are 0.99215 and 0.28382, respectively.
In [ ]:
 
In [27]:
# The best models based on respective metrics
best1 = scrutinizer.best_model(metric='val_r_squared', asc=False)
best2 = scrutinizer.best_model(metric='val_mean_absolute_error', asc=False)

# Predicting the Test set results
y_pred_rsq = best1.predict(x_test)
y_pred_mae = best2.predict(x_test)
r2_1 = r2_score(y_test, y_pred_rsq)
r2_2 = r2_score(y_test, y_pred_mae)
print('R-squared for r_squared and mae -based metrics, on test set, are %.5f and %.5f, respectively.'%(r2_1, r2_2))

#
y0 = y_test.flatten()
y1 = y_pred_rsq.flatten()
y2 = y_pred_mae.flatten()
results = pd.DataFrame({"y_test":y0, "y_pred_rsq":y1, "y_pred_mae":y2})

results.tail(20)
R-squared for r_squared and mae -based metrics, on test set, are 0.99245 and 0.89174, respectively.
Out[27]:
y_test y_pred_rsq y_pred_mae
423 12.582 11.967209 10.463826
424 14.335 14.337593 12.809447
425 14.873 15.325876 13.855522
426 14.875 14.710522 13.408031
427 13.091 12.773219 11.627477
428 10.330 10.032082 8.972277
429 6.713 7.151533 6.041293
430 4.850 4.461391 3.604507
431 3.881 3.452723 2.497569
432 4.664 4.035725 2.999849
433 6.740 6.292898 5.003578
434 9.313 9.272816 7.778694
435 12.312 12.030505 10.531526
436 14.505 14.208534 12.676332
437 15.051 15.166251 13.736426
438 14.755 14.629040 13.316429
439 12.999 12.789017 11.614746
440 10.801 10.081911 8.993826
441 7.433 7.287759 6.131873
442 5.518 4.696044 3.818761
In [ ]:
 
In [28]:
xpoints = df.iloc[-443:, 0]

trace0 = go.Scatter(x=xpoints, y=y0, name='Actual Values')
trace1 = go.Scatter(x=xpoints, y=y1, name='Predicted (RSQ)')
trace2 = go.Scatter(x=xpoints, y=y2, name='Predicted (MAE)')

data = [trace0, trace1, trace2]

layout=go.Layout(title="Actual and Predicted Temperatures", xaxis={'title':'Year'}, yaxis={'title':'Temprature'})
fig = go.Figure(data=data, layout=layout)
fig.show()
In [29]:
# We can now save the best model for further use or deployment

# Get the best model index based on the highest 'validation ROC_AUC' 
model_id = analyze_object.data[['val_r_squared']].idxmax()[0]

# Clear any previous TensorFlow session.
tf.keras.backend.clear_session()

# Load the model parameters from the scanner.
model = model_from_json(scrutinizer.saved_models[model_id])
model.set_weights(scrutinizer.saved_weights[model_id])
model.summary()
model.save('./avg_temp_best_model.h5')
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
bidirectional (Bidirectional multiple                  126720    
_________________________________________________________________
gru_1 (GRU)                  multiple                  27744     
_________________________________________________________________
dropout (Dropout)            multiple                  0         
_________________________________________________________________
dense (Dense)                multiple                  33        
=================================================================
Total params: 154,497
Trainable params: 154,497
Non-trainable params: 0
_________________________________________________________________

Conclusion:


As can be seen, a GRU achieves a mind-blowing Coefficiencient of Determination (r_squared) of 99% on completely new data (test set). The curve (red line), of the predicted values perfectly mimics the actual values curve (blue line), almost completely shading it.
In [ ]: